We present NUReality, a virtual reality (VR) environment designed to test how effectively vehicle behaviors communicate intent in interactions between autonomous vehicles (AVs) and pedestrians at urban intersections. In this project, we focus on expressive behaviors as a means for pedestrians to readily recognize the underlying intent of an AV's movements. VR is an ideal tool for testing these situations because it is immersive and places subjects in these potentially dangerous scenarios without risk. NUReality provides a novel and immersive virtual reality environment that includes numerous visual details (road and building textures, parked cars, swaying tree branches) as well as auditory details (birds chirping, cars driving in the distance). In this work, we present the NUReality environment, its 10 unique vehicle behavior scenarios, and the Unreal Engine and Autodesk Maya source files for each scenario. The files are publicly released as open source at www.nureality.org to support the academic community studying critical AV-pedestrian interactions.
Algorithms that involve both forecasting and optimization are at the core of solutions to many difficult real-world problems, such as in supply chains (inventory optimization), traffic, and in the transition towards carbon-free energy generation in battery/load/production scheduling in sustainable energy systems. Typically, in these scenarios we want to solve an optimization problem that depends on unknown future values, which therefore need to be forecast. As both forecasting and optimization are difficult problems in their own right, relatively little research has been done in this area. This paper presents the findings of the "IEEE-CIS Technical Challenge on Predict+Optimize for Renewable Energy Scheduling," held in 2021. We present a comparison and evaluation of the seven highest-ranked solutions in the competition, to provide researchers with a benchmark problem and to establish the state of the art for this benchmark, with the aim of fostering and facilitating research in this area. The competition used data from the Monash Microgrid, as well as weather data and energy market data. It then focused on two main challenges: forecasting renewable energy production and demand, and obtaining an optimal schedule for the activities (lectures) and on-site batteries that leads to the lowest energy cost. The most accurate forecasts were obtained by gradient-boosted tree and random forest models, and optimization was mostly performed using mixed integer linear and quadratic programming. The winning method predicted different scenarios and optimized over all scenarios jointly using a sample average approximation method.
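As a hedged illustration of the sample average approximation idea used by the winning entry, the sketch below optimizes one toy battery charge/discharge schedule against several hypothetical demand scenarios at once using PuLP; the prices, scenarios, variable names, and solver choice are assumptions made for illustration, not the competition code.

# Sketch: sample average approximation (SAA) for a toy battery schedule.
# Hypothetical data; not the competition code. Requires `pip install pulp`.
import pulp

T = 4                                  # time steps
price = [0.30, 0.10, 0.25, 0.40]       # assumed grid prices ($/kWh)
scenarios = [                          # sampled net-demand forecasts (kWh)
    [2.0, 1.0, 3.0, 2.5],
    [1.5, 1.2, 2.8, 3.0],
    [2.5, 0.8, 3.2, 2.0],
]
cap, power = 5.0, 2.0                  # battery capacity / max (dis)charge per step

m = pulp.LpProblem("saa_schedule", pulp.LpMinimize)
charge = [pulp.LpVariable(f"c{t}", 0, power) for t in range(T)]
discharge = [pulp.LpVariable(f"d{t}", 0, power) for t in range(T)]

# One schedule is shared by all scenarios; the cost is averaged over scenarios.
m += (1.0 / len(scenarios)) * pulp.lpSum(
    price[t] * (dem[t] + charge[t] - discharge[t])
    for dem in scenarios for t in range(T)
)

# State-of-charge limits (battery starts empty).
for t in range(T):
    soc = pulp.lpSum(charge[k] - discharge[k] for k in range(t + 1))
    m += soc >= 0
    m += soc <= cap

m.solve(pulp.PULP_CBC_CMD(msg=0))
print([(pulp.value(c), pulp.value(d)) for c, d in zip(charge, discharge)])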
Machine learning (ML) has recently facilitated many advances in solving problems related to many-body physical systems. Given the intrinsic quantum nature of these problems, it is natural to speculate that quantum-enhanced machine learning will enable us to unveil even greater details than we currently have. With this motivation, this paper examines a quantum machine learning approach based on shallow variational ansätze inspired by tensor networks for supervised learning tasks. In particular, we first look at standard image classification tasks using the Fashion-MNIST dataset and study the effect of repeating tensor network layers on the ansatz's expressibility and performance. Finally, we use this strategy to tackle the problem of quantum phase recognition for the transverse-field Ising and Heisenberg spin models in one and two dimensions, where we were able to reach $\geq 98\%$ test-set accuracies with both multi-scale entanglement renormalization ansatz (MERA) and tree tensor network (TTN) inspired parametrized quantum circuits.
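A minimal sketch of a TTN-inspired parametrized circuit is shown below, written in PennyLane purely for illustration; the framework, qubit count, feature encoding, and block structure are assumptions, not the paper's setup.

# Sketch: a tree tensor network (TTN) inspired variational circuit on 4 qubits.
# PennyLane is assumed here for illustration; the paper's framework may differ.
import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=4)

def block(weights, wires):
    # Two-qubit entangling block used at each node of the tree.
    qml.RY(weights[0], wires=wires[0])
    qml.RY(weights[1], wires=wires[1])
    qml.CNOT(wires=wires)

@qml.qnode(dev)
def ttn_classifier(features, weights):
    # Encode 4 input features as rotation angles.
    for w in range(4):
        qml.RY(features[w], wires=w)
    # Layer 1: pair up neighbouring qubits.
    block(weights[0], wires=[0, 1])
    block(weights[1], wires=[2, 3])
    # Layer 2: merge the two subtrees; qubit 3 carries the class readout.
    block(weights[2], wires=[1, 3])
    return qml.expval(qml.PauliZ(3))

weights = np.random.uniform(0, np.pi, size=(3, 2))
print(ttn_classifier(np.array([0.1, 0.5, 0.9, 1.3]), weights))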
Cloth in the real world is often crumpled, self-occluded, or folded in on itself such that key regions, such as corners, are not directly graspable, making manipulation difficult. We propose a system that leverages visual and tactile perception to unfold the cloth via grasping and sliding on edges. By doing so, the robot is able to grasp two adjacent corners, enabling subsequent manipulation tasks like folding or hanging. As components of this system, we develop tactile perception networks that classify whether an edge is grasped and estimate the pose of the edge. We use the edge classification network to supervise a visuotactile edge grasp affordance network that can grasp edges with a 90% success rate. Once an edge is grasped, we demonstrate that the robot can slide along the cloth to the adjacent corner using tactile pose estimation/control in real time. See http://nehasunil.com/visuotactile/visuotactile.html for videos.
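For illustration only, here is a minimal PyTorch sketch of a binary "edge grasped?" classifier operating on tactile images; the architecture, input resolution, and class layout are assumptions rather than the paper's networks.

# Sketch: a small CNN for binary edge-grasp classification from tactile images.
# Layer sizes and input resolution are illustrative assumptions, not the paper's model.
import torch
import torch.nn as nn

class EdgeGraspClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),   # tactile image input
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 2)   # logits: {no edge grasped, edge grasped}

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = EdgeGraspClassifier()
logits = model(torch.randn(4, 3, 64, 64))   # batch of 4 tactile frames
print(logits.shape)                         # torch.Size([4, 2])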
There has been a recent explosion of impressive generative models that can produce high-quality images (or videos) conditioned on text descriptions. However, all such approaches rely on conditioning sentences that contain unambiguous descriptions of scenes and the main actors in them. Employing such models for the more complex task of story visualization, where references and co-references naturally exist and one needs to reason about when to maintain consistency of actors and backgrounds across frames/scenes, and when not to, based on story progression, therefore remains a challenge. In this work, we address the aforementioned challenges and propose a novel autoregressive diffusion-based framework with a visual memory module that implicitly captures the actor and background context across the generated frames. Sentence-conditioned soft attention over the memories enables effective reference resolution and learns to maintain scene and actor consistency when needed. To validate the effectiveness of our approach, we extend the MUGEN dataset and introduce additional characters, backgrounds and referencing in multi-sentence storylines. Our experiments for story generation on the MUGEN, PororoSV and FlintstonesSV datasets show that our method not only outperforms the prior state of the art in generating frames with high visual quality that are consistent with the story, but also models appropriate correspondences between the characters and the background.
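The core mechanism, sentence-conditioned soft attention over a bank of frame memories, can be sketched as follows; the dimensions and module names are illustrative assumptions, not the paper's implementation.

# Sketch: sentence-conditioned soft attention over a visual memory bank.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryAttention(nn.Module):
    def __init__(self, text_dim=256, mem_dim=512, out_dim=512):
        super().__init__()
        self.q = nn.Linear(text_dim, out_dim)   # query from the current sentence
        self.k = nn.Linear(mem_dim, out_dim)    # keys from past frame memories
        self.v = nn.Linear(mem_dim, out_dim)    # values from past frame memories

    def forward(self, sentence_emb, memory):
        # sentence_emb: (B, text_dim); memory: (B, T_prev, mem_dim)
        q = self.q(sentence_emb).unsqueeze(1)             # (B, 1, D)
        k, v = self.k(memory), self.v(memory)             # (B, T_prev, D)
        attn = F.softmax(q @ k.transpose(1, 2) / k.shape[-1] ** 0.5, dim=-1)
        return (attn @ v).squeeze(1)                      # context for the next frame

ctx = MemoryAttention()(torch.randn(2, 256), torch.randn(2, 5, 512))
print(ctx.shape)   # torch.Size([2, 512])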
Unhealthy dietary habits are considered the primary cause of multiple chronic diseases such as obesity and diabetes. Automatic food intake monitoring systems have the potential to improve the quality of life (QoL) of people with diet-related diseases through dietary assessment. In this work, we propose a novel contact-less radar-based food intake monitoring approach. Specifically, a Frequency Modulated Continuous Wave (FMCW) radar sensor is employed to recognize fine-grained eating and drinking gestures. A fine-grained eating/drinking gesture comprises a series of movements from raising the hand to the mouth until putting the hand away from the mouth. A 3D temporal convolutional network (3D-TCN) is developed to detect and segment eating and drinking gestures in meal sessions by processing the Range-Doppler Cube (RD Cube). Unlike previous radar-based research, this work collects data in continuous meal sessions. We create a public dataset that contains 48 meal sessions (3121 eating gestures and 608 drinking gestures) from 48 participants with a total duration of 783 minutes. Four eating styles (fork & knife, chopsticks, spoon, hand) are included in this dataset. To validate the performance of the proposed approach, an 8-fold cross-validation method is applied. Experimental results show that, in eating and drinking gesture detection, our proposed 3D-TCN outperforms a model that combines a convolutional neural network and a long short-term memory network (CNN-LSTM), as well as the CNN-Bidirectional LSTM model (CNN-BiLSTM). The 3D-TCN model achieves segmental F1-scores of 0.887 and 0.844 for eating and drinking gestures, respectively. These results indicate the feasibility of using radar for fine-grained eating and drinking gesture detection and segmentation in meal sessions.
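A minimal sketch of a dilated 3D temporal convolution stack over Range-Doppler Cube sequences is given below; the channel counts, dilation schedule, class layout, and input shape are assumptions rather than the paper's 3D-TCN.

# Sketch: a dilated 3D temporal convolution stack over Range-Doppler Cube sequences.
import torch
import torch.nn as nn

class TCNBlock3D(nn.Module):
    def __init__(self, ch, dilation):
        super().__init__()
        # Dilation grows only along the time axis; range/Doppler axes use plain 3x3 kernels.
        self.conv = nn.Conv3d(ch, ch, kernel_size=(3, 3, 3),
                              padding=(dilation, 1, 1), dilation=(dilation, 1, 1))
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.conv(x)) + x   # residual connection

class TCN3D(nn.Module):
    def __init__(self, n_classes=3, ch=16):   # assumed classes: background / eat / drink
        super().__init__()
        self.stem = nn.Conv3d(1, ch, 3, padding=1)
        self.blocks = nn.Sequential(*[TCNBlock3D(ch, d) for d in (1, 2, 4)])
        self.head = nn.Conv3d(ch, n_classes, 1)

    def forward(self, x):          # x: (B, 1, T, range_bins, doppler_bins)
        y = self.head(self.blocks(self.stem(x)))
        return y.mean(dim=(3, 4))  # per-frame class logits: (B, n_classes, T)

logits = TCN3D()(torch.randn(1, 1, 32, 16, 16))
print(logits.shape)   # torch.Size([1, 3, 32])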
Cement is the most used construction material. The performance of cement hydrate depends on the constituent phases, viz. alite, belite, aluminate, and ferrites, present in the cement clinker, both qualitatively and quantitatively. Traditionally, clinker phases are analyzed from optical images relying on a domain expert and simple image processing techniques. However, the non-uniformity of the images, variations in the geometry and size of the phases, and variabilities in the experimental approaches and imaging methods make it challenging to identify the phases. Here, we present a machine learning (ML) approach to detect clinker microstructure phases automatically. To this end, we create the first annotated dataset of cement clinker by segmenting alite and belite particles. Further, we use supervised ML methods to train models for identifying alite and belite regions. Specifically, we finetune the image detection and segmentation model Detectron-2 on the cement microstructure to develop a model for detecting the cement phases, namely Cementron. We demonstrate that Cementron, trained only on literature data, works remarkably well on new images obtained from our experiments, demonstrating its generalizability. We make Cementron available for public use.
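As a rough illustration of this kind of fine-tuning, the sketch below adapts Detectron2's Mask R-CNN to a two-class clinker dataset; the dataset name, annotation paths, and hyperparameters are hypothetical placeholders and this is not the Cementron training recipe.

# Sketch: fine-tuning a Detectron2 Mask R-CNN for alite/belite instance segmentation.
# The dataset name, JSON path, and hyperparameters are hypothetical placeholders.
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultTrainer

register_coco_instances("clinker_train", {},
                        "annotations/clinker_train.json", "images/train")

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("clinker_train",)
cfg.DATASETS.TEST = ()
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 2   # alite, belite
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.MAX_ITER = 2000

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()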
Signature-based malware detectors have proven insufficient, as even a small change in malignant executable code can bypass them. Many machine learning-based models have been proposed to efficiently detect a wide variety of malware. Many of these models are found to be susceptible to adversarial attacks - attacks that work by generating intentionally designed inputs that can force these models to misclassify. Our work aims to explore vulnerabilities of current state-of-the-art malware detectors to adversarial attacks. We train a Transformer-based malware detector, carry out adversarial attacks resulting in a misclassification rate of 23.9%, and propose defenses that reduce this misclassification rate by half. An implementation of our work can be found at https://github.com/yashjakhotiya/Adversarial-Attacks-On-Transformers.
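To illustrate the flavour of such attacks, the sketch below applies an FGSM-style perturbation in embedding space to a toy Transformer classifier; the model, byte vocabulary, and step size are assumptions, and this is not the attack or defense evaluated in the paper.

# Sketch: FGSM-style embedding-space perturbation against a toy Transformer classifier.
import torch
import torch.nn as nn

vocab, dim, n_classes = 258, 64, 2          # e.g. byte vocabulary plus special tokens
embed = nn.Embedding(vocab, dim)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), 2)
head = nn.Linear(dim, n_classes)

tokens = torch.randint(0, vocab, (1, 128))  # stand-in for a tokenized executable
label = torch.tensor([1])                   # 1 = malware

emb = embed(tokens).detach().requires_grad_(True)
logits = head(encoder(emb).mean(dim=1))
loss = nn.functional.cross_entropy(logits, label)
loss.backward()

# Move embeddings in the direction that increases the loss (towards evasion).
adv_emb = emb + 0.1 * emb.grad.sign()
adv_logits = head(encoder(adv_emb).mean(dim=1))
print(logits.argmax(-1).item(), adv_logits.argmax(-1).item())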
Modern society is interested in capturing high-resolution, fine-quality images owing to the proliferation of sophisticated cameras. However, when such images are used in computer vision tasks, noise contamination in the images not only degrades image quality but also adversely affects subsequent processes such as remote sensing and object tracking. Timely processing of high-resolution images is constrained by the hardware limitations of the image-capturing instruments. Geodesic Gramian denoising (GGD) is a manifold-based noise filtering method that we introduced in our previous research; it utilizes a few prominent singular vectors of the Gramian matrix of geodesics for the noise filtering process. The applicability of GGD is limited because it incurs $\mathcal{O}(n^6)$ computational complexity when the singular value decomposition (SVD) is performed on the underlying $n^2 \times n^2$ data matrix. In this study, we increase the efficiency of the GGD framework by replacing its SVD step with four different singular vector approximation techniques. Here, we compare the computational time and noise filtering performance of the four techniques integrated into GGD.
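As an illustration of trading exactness for speed in the SVD step, the sketch below compares a full SVD with scikit-learn's randomized SVD on a random surrogate matrix; randomized SVD is used here only as an example of a singular vector approximation and is not necessarily one of the paper's four techniques.

# Sketch: full SVD vs. one singular-vector approximation (randomized SVD) on a surrogate matrix.
import time
import numpy as np
from sklearn.utils.extmath import randomized_svd

n = 32                                   # image of size n x n gives an n^2 x n^2 data matrix
A = np.random.rand(n * n, n * n)         # surrogate matrix, not an actual image Gramian
k = 10                                   # number of prominent singular vectors kept by GGD

t0 = time.time()
U_full, s_full, _ = np.linalg.svd(A, full_matrices=False)
t_full = time.time() - t0

t0 = time.time()
U_rand, s_rand, _ = randomized_svd(A, n_components=k, random_state=0)
t_rand = time.time() - t0

print(f"full SVD: {t_full:.2f}s, randomized SVD (k={k}): {t_rand:.2f}s")
print("max relative error on leading singular values:",
      np.max(np.abs(s_full[:k] - s_rand) / s_full[:k]))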
Humans learn continually throughout their lifespan by accumulating diverse knowledge and fine-tuning it for future tasks. When presented with similar objectives, neural networks suffer from catastrophic forgetting if the data distributions across the sequence of tasks are not stationary over the course of learning. An effective approach to address such continual learning (CL) problems is to use a hypernetwork that generates task-dependent weights for a target network. However, the continual learning performance of existing hypernetwork-based approaches is limited by the assumption of independence of the weights across layers, made in order to maintain parameter efficiency. To address this limitation, we propose a novel approach that uses a dependency-preserving hypernetwork to generate weights for the target network while also maintaining parameter efficiency. We propose a recurrent neural network (RNN) based hypernetwork that can generate layer weights efficiently while allowing for dependencies across them. In addition, we propose novel regularization and network growth techniques for the RNN-based hypernetwork to further improve continual learning performance. To demonstrate the effectiveness of the proposed methods, we conduct experiments on several image classification continual learning tasks and settings. We find that the proposed methods based on RNN hypernetworks outperform the baselines in all these CL settings and tasks.
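A minimal sketch of the underlying idea follows: an RNN hypernetwork emits the weights of a small target network layer by layer, so later layers' weights depend on earlier ones through the recurrent state. The sizes, GRU cell choice, and task conditioning are assumptions made for illustration, not the paper's architecture.

# Sketch: an RNN hypernetwork that generates target-MLP weights layer by layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RNNHyperNetwork(nn.Module):
    def __init__(self, task_dim=8, hidden=64, layer_shapes=((16, 4), (2, 16))):
        super().__init__()
        self.layer_shapes = layer_shapes
        self.rnn = nn.GRUCell(task_dim, hidden)
        # One output head per target layer, fed from the shared recurrent state.
        self.heads = nn.ModuleList(nn.Linear(hidden, o * i) for o, i in layer_shapes)

    def forward(self, task_emb):
        h = torch.zeros(task_emb.shape[0], self.rnn.hidden_size)
        weights = []
        for head, (o, i) in zip(self.heads, self.layer_shapes):
            h = self.rnn(task_emb, h)            # state carries layer-to-layer dependency
            weights.append(head(h).view(-1, o, i))
        return weights

def target_forward(x, weights):
    # A 2-layer MLP whose weights were generated by the hypernetwork (one task here).
    h = F.relu(x @ weights[0][0].T)
    return h @ weights[1][0].T

hyper = RNNHyperNetwork()
w = hyper(torch.randn(1, 8))                          # task embedding for one task
print(target_forward(torch.randn(5, 4), w).shape)     # torch.Size([5, 2])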